
FIGURE 1.7 The generation of CiF.

Martinez et al. [168] minimize the discrepancy between the output of a binary convolution and the corresponding real-valued convolution. They propose real-to-binary attention matching, a loss well suited to training 1-bit CNNs, and also devise a training scheme in which the architectural gap between real and binary networks is bridged progressively through a sequence of teacher-student pairs.
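The following is a minimal sketch of an attention-matching loss between a real-valued teacher feature map and a binarized student feature map; the attention form (channel-wise sum of squared activations) and all function names are illustrative assumptions, not the exact loss of [168].

```python
import numpy as np

def spatial_attention(feat):
    """Collapse a (C, H, W) feature map to a normalized (H*W,) attention vector
    by summing squared activations over channels (a common attention-transfer form)."""
    att = (feat ** 2).sum(axis=0).reshape(-1)
    return att / (np.linalg.norm(att) + 1e-8)

def attention_matching_loss(real_feat, binary_feat):
    """L2 distance between the teacher's (real-valued) and student's (binarized)
    spatial attention maps; minimizing it pulls the binary features toward the real ones."""
    return np.sum((spatial_attention(real_feat) - spatial_attention(binary_feat)) ** 2)

# Toy usage: two feature maps of shape (channels, height, width).
rng = np.random.default_rng(0)
real = rng.standard_normal((64, 8, 8))
binary = np.sign(real + 0.5 * rng.standard_normal((64, 8, 8)))
print(attention_matching_loss(real, binary))
```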

Instead of starting from a pre-trained full-precision model, Bethge et al. [11] train a binary network directly from scratch, without relying on such standard aids. Their implementation is based on the BMXNet framework [268].
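A minimal sketch of the core ingredient of from-scratch binary training, assuming the usual sign binarization with a straight-through estimator on a single linear layer; this illustrates the general mechanism, not the specific recipe of [11].

```python
import numpy as np

def binarize(w):
    """Sign binarization; map zeros to +1 so every weight is in {-1, +1}."""
    return np.where(w >= 0, 1.0, -1.0)

def train_step(latent_w, x, target, lr=0.01):
    """One step on a single binary linear layer with a straight-through estimator:
    forward with sign(latent_w), backward as if binarization were the identity."""
    wb = binarize(latent_w)
    pred = x @ wb
    grad_pred = 2 * (pred - target) / pred.size      # d(MSE)/d(pred)
    grad_wb = x.T @ grad_pred                        # gradient w.r.t. binary weights
    latent_w -= lr * grad_wb                         # STE: apply it to the latent weights
    return latent_w, np.mean((pred - target) ** 2)

rng = np.random.default_rng(0)
w = 0.1 * rng.standard_normal((16, 4))
x = rng.standard_normal((32, 16))
t = rng.standard_normal((32, 4))
for _ in range(5):
    w, loss = train_step(w, x, t)
print(loss)
```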

Helwegen et al. [85] argue that the real-valued latent weights in a BNN should not be treated as analogous to the weights of a real-valued network; rather, their primary role is to provide inertia during training. Based on this view, they introduce the Binary Optimizer (Bop), the first optimizer designed specifically for BNNs.
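A Bop-style update can be sketched as follows: keep an exponential moving average of the gradient for each binary weight and flip the weight only when this momentum is large and pushes against the weight's current sign. The hyperparameter names and values below are placeholders; consult [85] for the exact rule and settings.

```python
import numpy as np

def bop_step(w_binary, grad, momentum, gamma=1e-3, tau=1e-6):
    """One Bop-style update.
    w_binary : current binary weights in {-1, +1}
    grad     : gradient of the loss w.r.t. the binary weights
    momentum : running exponential average of past gradients
    A weight is flipped when its momentum is both large (> tau) and agrees
    in sign with the weight, i.e. gradient descent would push it the other way."""
    momentum = (1 - gamma) * momentum + gamma * grad
    flip = (np.abs(momentum) > tau) & (np.sign(momentum) == np.sign(w_binary))
    w_binary = np.where(flip, -w_binary, w_binary)
    return w_binary, momentum

rng = np.random.default_rng(0)
w = np.sign(rng.standard_normal(8))
m = np.zeros(8)
for _ in range(100):
    g = rng.standard_normal(8)          # stand-in for a real gradient
    w, m = bop_step(w, g, m)
print(w)
```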

BinaryDuo [115] proposes a new training scheme for binary-activation networks in which two binary activations are coupled into a ternary activation during training. After training the coupled model, each ternary activation is decoupled into two binary activations, which doubles the number of weights. To compensate, the coupled ternary model is made smaller so that, after decoupling, its parameter size matches that of the baseline binary model. The decoupled model is then fine-tuned with each weight updated independently, since the two weight copies no longer need to share the same value.
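The coupling can be illustrated with a small sketch: the sum of two binary activations with shifted thresholds yields a ternary value, and decoupling gives each binary branch its own copy of the weights. The thresholds and {0, 1} coding below are illustrative assumptions, not necessarily the parameterization used in [115].

```python
import numpy as np

def binary_act(x, threshold=0.0):
    """Binary activation in {0, 1}."""
    return (x > threshold).astype(np.float64)

def ternary_act(x):
    """Ternary activation in {0, 1, 2}, written as two coupled binary
    activations with shifted thresholds (one possible coupling)."""
    return binary_act(x, -0.5) + binary_act(x, 0.5)

def decoupled_output(x, w):
    """After decoupling, each binary branch gets its own copy of the weights
    (still identical here); fine-tuning then lets the two copies diverge."""
    w1, w2 = w.copy(), w.copy()          # doubled parameters
    return binary_act(x, -0.5) @ w1 + binary_act(x, 0.5) @ w2

rng = np.random.default_rng(0)
x = rng.standard_normal((4, 8))
w = rng.standard_normal((8, 3))
coupled = ternary_act(x) @ w
print(np.allclose(coupled, decoupled_output(x, w)))   # identical before fine-tuning
```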

BENN [301] applies classical ensemble methods to improve the performance of 1-bit CNNs. Although ensemble techniques are widely believed to be only marginally helpful for strong classifiers such as deep neural networks, the authors' analysis and experiments show that they are a natural fit for boosting BNNs. The main uses of ensemble strategies are described in [19, 32, 184].
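The basic idea can be sketched as combining the outputs of several independently trained 1-bit networks, e.g. by averaging logits (bagging-style) or by majority vote on predicted labels. These combination rules are generic ensemble methods, not BENN's specific bagging/boosting variants.

```python
import numpy as np

def ensemble_predict(logits_per_member):
    """Combine K member outputs of shape (K, batch, classes).
    Returns (soft-averaged predictions, majority-vote predictions)."""
    logits = np.asarray(logits_per_member)
    avg_pred = logits.mean(axis=0).argmax(axis=1)            # bagging: average logits
    votes = logits.argmax(axis=2)                            # (K, batch) hard labels
    maj_pred = np.array([np.bincount(votes[:, i]).argmax()   # majority vote per sample
                         for i in range(votes.shape[1])])
    return avg_pred, maj_pred

rng = np.random.default_rng(0)
members = rng.standard_normal((5, 10, 3))   # 5 BNNs, 10 samples, 3 classes
print(ensemble_predict(members))
```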

TentacleNet [173] is also inspired by the theory of ensemble learning. Compared to BENN [301], TentacleNet goes a step further, showing that binary ensembles can reach high accuracy with fewer resources.

BayesBiNN [170] places a distribution over the binary variables, yielding a principled approach to discrete optimization. It uses a Bernoulli approximation to the posterior and estimates it with the Bayesian learning rule proposed in [112].
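A heavily simplified sketch of the underlying idea: each {-1, +1} weight carries a Bernoulli distribution parameterized by a natural (logit) parameter, weights are sampled from it during training, and the logits are updated from the gradient. The update below is a plain chain-rule step through the expected weight, standing in for (but not reproducing) the Bayesian learning rule of [112].

```python
import numpy as np

def expected_weight(lam):
    """Mean of a {-1, +1} Bernoulli with natural parameter lam: E[w] = tanh(lam)."""
    return np.tanh(lam)

def sample_weight(lam, rng):
    """Draw a binary weight w in {-1, +1} with P(w = +1) = sigmoid(2 * lam)."""
    p = 1.0 / (1.0 + np.exp(-2.0 * lam))
    return np.where(rng.random(lam.shape) < p, 1.0, -1.0)

def update(lam, grad_w, lr=0.1):
    """Gradient step on the natural parameters using the gradient w.r.t. the
    binary weights; a simplified stand-in for the rule in [112]."""
    return lam - lr * grad_w * (1.0 - np.tanh(lam) ** 2)   # chain rule through tanh

rng = np.random.default_rng(0)
lam = np.zeros(6)
w = sample_weight(lam, rng)
g = rng.standard_normal(6)          # stand-in gradient
lam = update(lam, g)
print(expected_weight(lam))
```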

1.2 Applications

The success of BNNs makes it possible to apply deep learning models to edge computing. With the help of these binarization methods, neural network models have been used in various real-world tasks, including image classification, speech recognition, and object detection and tracking.